Running head: Frequency Effects in Monolingual and Bilingual Natural Reading
Abstract
This paper presents the first systematic examination of the monolingual and bilingual frequency effect (FE) during natural reading. We analyzed single fixation durations on content words for participants reading an entire novel. Unbalanced bilinguals and monolinguals show a similarly sized FE in their mother tongue (L1), but for bilinguals the FE is considerably larger in their second language (L2) than in their L1. The FE in both L1 and L2 reading decreased with increasing L1 proficiency, but it was not affected by L2 proficiency. Our results are consistent with an account of bilingual language processing that assumes an integrated mental lexicon with exposure as the main determiner for lexical entrenchment (Diependaele, Lemhöfer, & Brysbaert, 2013; Gollan et al., 2008). This means that no qualitative difference in language processing between monolingual, bilingual L1 or bilingual L2 is necessary to explain reading behavior. We specify this account and argue that not all groups of bilinguals necessarily have lower L1 exposure than monolinguals do and, in line with Kuperman and Van Dyke (2013), that individual vocabulary size and language exposure change the accuracy of the relative corpus word frequencies and thereby determine the size of the FE’s in the same way for all participants.
Although word recognition and production are both very complex processes influenced by a wide range of variables, the frequency of occurrence of a word in a language is by far the most robust predictor of language performance (Brysbaert et al., 2011; Murray & Forster, 2004). In both word identification (e.g. Rubenstein, Garfield, & Millikan, 1970; Scarborough, Cortese, & Scarborough, 1977) and word production tasks (e.g. Forster & Chambers, 1973; Monsell, Doyle, & Haggard, 1989), high frequency words are processed faster than low frequency words.
This observation is called the word frequency effect (FE), and it is one of the most investigated phenomena in (monolingual) psycholinguistics. Multiple models of language comprehension (e.g. Dijkstra & Van Heuven, 2002; McClelland & Rumelhart, 1981; Morton, 1970) explain frequency effects using implicit learning accounts. These state that repeated exposure to a certain lexical item raises this item’s baseline activation in proportion to its distance to the activation threshold, so that lexical selection of that particular word is faster during recognition (e.g. Monsell, 1991). The maximal speed of lexical access is limited, so once a word has received a certain amount of exposure, no more facilitation is expected from additional exposure to that particular item (Morton, 1970). In the visual domain, word recognition speed increases with the logarithm of word corpus frequency (Howes & Solomon, 1951). A certain number of additional exposures to a low frequency word will result in a large decrease of its lexical access time, while the same number of additional exposures to a high frequency word will result in a much smaller decrease of its lexical access time. This particular characteristic of the relationship between word frequency and processing time causes the size of the frequency effect to be modulated by language exposure. Bilinguals offer an interesting opportunity to study the relationship between exposure and lexical access, because of the within-subject difference in language exposure for L1 and L2. We will examine the effect of word frequency in bilingualism on the basis of new natural reading data collected for English monolinguals and Dutch-English bilinguals. We will start by examining the literature on individual differences in the word frequency effect and discuss the relation of these findings to the frequency effect in bilinguals.
Following Kuperman and Van Dyke (2013), we will formulate an account of exposure-related differences in the effect of corpus word frequency that originates in the statistical characteristics of word frequency distributions.
Individual Differences in the FE
The collection and evaluation of frequency norms based on text corpora is central to psycholinguistic research (e.g., Brysbaert & New, 2009; Keuleers, Brysbaert & New, 2010; Van Heuven, Mandera, Keuleers, & Brysbaert, 2014). The number of exposures to a certain word is often operationalized as the count of word occurrences in language corpora like the SUBTLEX database (Keuleers et al., 2010). Mostly, corpus frequencies are expressed as relative values because these can be used independently of corpus size. These objective corpus word frequencies are supposed to reflect the average number of exposures of an experienced reader to certain words. While corpus word frequencies are a tremendously useful proxy measure for relative exposure, it should not be forgotten that the relative frequency of a word in a text corpus is not necessarily equal to the relative frequency of exposure to that word for a particular individual. Solomon and Howes (1951) already emphasized that word counts from text corpora are based on an arbitrary sample of the language and that there may be individual variation in the relative frequency of exposure to specific words. In other words, corpus word frequencies may under- or overestimate subjective word frequencies, which can lead to a difference in the size of the FE when corpus word frequencies are used in analyses. The differences in the FE size would disappear when a measure of actual exposure or subjective frequency (e.g., Connine, Mullennix, Shernoff, & Yelen, 1990; Gernsbacher, 1984) is used.
Still, in experiments where words from different semantic domains (for example tools or clothing) are used as stimuli, such differences in relative frequency would in principle not lead to systematic differences in the frequency effect between individuals. This is because differences in subjective frequency in particular semantic categories would be cancelled out by the use of stimuli from multiple domains. Next to the possibility of individual differences in the relative frequency for specific words due to differences in experience with a specific vocabulary, it is possible that individuals who are at different stages in the language acquisition process or, more broadly, have a differing amount of total language exposure may have different relative frequencies for words. For this reason, some studies have used familiarity ratings of words as a more accurate reflection of the actual exposure to certain words for a specific group of readers (e.g. Balota, Pilotti, & Cortese, 2001; Kuperman & Van Dyke, 2013). Balota et al. (2001) observed that these subjective norms explained unique variance above and beyond objective corpus frequencies for lexical decision and naming tasks. Kuperman and Van Dyke (2013) confirm that, for individuals with smaller vocabularies, objective corpus frequencies are particularly poor estimates and systematically overestimate the subjective frequencies of low frequency words.
Bilingual FE’s
Most research on the frequency effect in language processing has focused on monolingual participants, while more than half of the world population is bilingual or multilingual, which makes the bilingual the ‘default’ person. Taking into account that bi- or multilingualism is at least as widespread as monolingualism, it is important to assess how exposure to L1 or L2 affects a bilingual person’s language processing.
This is not straightforward because there is now a consensus that L1 and L2 constantly interact during visual word recognition (e.g. Duyck, Van Assche, Drieghe, & Hartsuiker, 2007; Van Assche, Duyck & Hartsuiker, 2012). These crosslingual interactions strongly suggest the existence of a unified bilingual lexicon with parallel activation for all items in that lexicon, with items competing for selection within and across languages (for a more comprehensive overview of the evidence for an integrated bilingual lexicon see Brysbaert & Duyck, 2010 and Dijkstra & Van Heuven, 2002). Not only does L1 knowledge influence L2 lexical access, but the knowledge of an L2 also changes L1 visual word recognition (e.g. Van Assche, Duyck, Hartsuiker & Diependaele, 2009). Because these interactions occur in both directions, it is important to assess not only the differential influence of word exposure on lexical access for L1 and L2 reading, but also the possible differences between the frequency effect for monolinguals and bilinguals in L1. Although the individual differences in frequency distribution described above are relevant for monolingual research, this is even more the case for bilingual research. The integrated bilingual lexicon will contain on average more lexical items than that of a monolingual. For advanced learners of an L2, who have a lexical entry for almost all concepts, we can assume that they would have almost double the number of words in their lexicon. Inspired by observations of bilingual disadvantages in production tasks (e.g. Ivanova & Costa, 2008; Gollan, Montoya, Fennema-Notestine & Morris, 2005; Gollan et al., 2011), the weaker links theory (Gollan & Silverberg, 2001; Gollan & Acenas, 2004; Gollan et al., 2008, 2011) was proposed.
This theory posits that bilinguals necessarily divide their language use across two languages, resulting in lower exposure to all of the words in their lexicon, including L1 words. The lexical representations of bilinguals in both languages will have accumulated less exposure than the ones in the monolingual lexicon. Over time, this pattern of use would lead to weaker links between semantics and phonology for bilinguals, relative to monolinguals (Gollan et al., 2008).
Diependaele et al. (2013) generalize the weaker links account and assume a decrease in lexical exposure for bilinguals. They suggest that this can result in reduced lexical entrenchment, either through reduced lexical precision of those representations (e.g. Perfetti, 1992, 2007), or through reduced word-word inhibition or weaker integration between phonological and semantic codes (e.g. Gollan et al., 2008, 2011). In short, the mere knowledge of a second language (and being exposed to its words) will reduce the lexical entrenchment of the first language, because this language will receive less exposure. Gollan et al. (2008) suggest a direct relationship between the weaker links and the frequency effect. They make the explicit hypothesis that bilinguals should have a larger frequency effect than monolinguals because a) bilinguals have used words in each language less often than monolinguals have and b) increased use leads to increased lexical accessibility only until a certain ceiling level of exposure, meaning that low frequency words should be more affected by differences in degree-of-use than high frequency words. From this hypothesis, we can also predict that in the case of unbalanced bilinguals, for whom L2 exposure is lower than L1 exposure, the L2 FE’s will also be larger than the L1 FE’s.
We support the idea posited by the weaker links account that differential FE’s in the bilingual domain can be explained without assuming qualitatively different language processing for bilinguals compared to monolinguals, and we aim to specify the hypotheses put forward by the weaker links account (Gollan et al., 2008).
Word Frequency Distribution
Because of the logarithmic relationship between corpus word frequency and lexical access time, it is customary to use logarithmically transformed corpus word frequencies in any analysis where word frequency is a variable in the model. This transformation changes the functional relationship between corpus word frequency and lexical access time from a logarithmic one to a linear one (see the upper and middle panel of Figure A.1 in Appendix A for an illustration). When detecting changes in the size of the FE related to language exposure, it is important to note that when these transformed corpus word frequencies are used, the size of the word frequency effect is not affected by absolute exposure. In other words, while a participant who has more exposure to a certain language will be faster to process words in that language than a participant who has little exposure to that language, an analysis based solely on transformed corpus word frequency would predict that the difference in processing times for high frequency and low frequency words, that is, the FE, is the same for both participants.
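The prediction in the preceding paragraph can be illustrated with a minimal sketch. It assumes a hypothetical model in which lexical access time falls linearly with the base-10 logarithm of the absolute number of encounters with a word. The intercept, the slope, the relative frequencies (1 and 100 per million) and the two exposure totals are illustrative values chosen here, not estimates from the study.

```python
import math

# Hypothetical model parameters (illustrative assumptions, not fitted values).
INTERCEPT_MS = 400.0
SLOPE_MS_PER_LOG_UNIT = 30.0


def encounters(per_million, total_words):
    """Absolute number of encounters with a word of a given relative frequency."""
    return per_million * total_words / 1_000_000


def access_time_ms(per_million, total_words):
    """Assumed access time: linear in log10 of the absolute number of encounters."""
    return INTERCEPT_MS - SLOPE_MS_PER_LOG_UNIT * math.log10(
        encounters(per_million, total_words)
    )


def frequency_effect_ms(low_pm, high_pm, total_words):
    """FE: access time for the low frequency word minus the high frequency word."""
    return access_time_ms(low_pm, total_words) - access_time_ms(high_pm, total_words)


for total_words in (10_000_000, 100_000_000):
    print(
        total_words,
        round(access_time_ms(1, total_words), 1),
        round(access_time_ms(100, total_words), 1),
        round(frequency_effect_ms(1, 100, total_words), 1),
    )
```

Under these assumptions the participant with ten times more exposure is faster on both words, but the difference between the low and the high frequency word is identical for the two participants, because total exposure cancels out of the difference of logarithms.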
Still another way of putting it is that when x and y are untransformed relative corpus word frequencies (for instance x = 100 per million and y = 1 per million), then for a participant who has been exposed to 100 million words the difference in absolute exposure between x and y is 9,900 (10,000 - 100), while for a participant who has been exposed to 10 million words, the difference is 990 (1,000 - 10), which would lead to a larger frequency effect for the participant with more exposure. When logarithmically transformed frequencies are used, for the participant with exposure to 100 million words the difference between x and y is 2 (log10(10,000) - log10(100) = 4 - 2 = 2), while for the participant with exposure to 10 million words, the difference between x and y is also 2 (log10(1,000) - log10(10) = 3 - 1 = 2). Another element to consider is that word frequency distributions are fundamentally different from normal distributions, which psychologists are used to working with. For instance, a typical characteristic of normal distributions is that the mean of a sample is an estimate that could be higher or lower than the population average and that gets more and more precise as the sample size grows. This characteristic is not shared with word frequency distributions. Instead, one of the characteristics of word frequency distributions is that the mean predictably increases as the sample, or the corpus, grows (Baayen, 2001). Importantly, Kuperman and Van Dyke (2013) show that relative word frequency is also related to the corpus size. They demonstrate that as corpus size grows, the relative frequency of low frequency words increases while the relative frequency of high frequency words stays almost constant (see Table 1).
By dividing words into ten frequency bands, they show that words in the lowest frequency band (1) have an estimate of relative frequency that is more than twice as large in a corpus of 50 million words as in a corpus of 5 million words (ratio: 2.234); relative frequency estimates for words in the highest frequency band (10), on the other hand, were nearly equivalent (ratio: 1.003).
Table 1. The ratio of a word’s relative frequency in the 50-million token SUBTLEX corpus to its relative frequency in a sample of 5 million tokens (relative frequencies averaged over 1000 samples). Taken from Kuperman & Van Dyke (2013).
It is precisely this characteristic of word frequency distributions that is overlooked in the analysis of the effect of word frequency. If the evolution of relative word frequency with more exposure follows a trajectory that is analogous to the evolution of relative frequency with increase in corpus size, this alone can account for differences in the size of the FE. On these grounds, an interaction of proficiency and corpus frequency is expected, but it should not be attributed to qualitative differences between poor and good readers, or to a categorical difference between monolinguals and bilinguals. As we already mentioned, when assuming lower exposure to all items in the lexicon and using raw corpus word frequencies in the analyses, a larger FE slope is expected. When we log transform these word frequencies we do not necessarily expect a larger FE slope as long as the ratios between the relative frequencies stay the same. The importance of changes for low frequency words but not for high frequency words is exactly what a logarithmic transformation accounts for; differences in the frequency effect due to a lower exposure to all words in the lexicon should not be found if a logarithmic transformation is used and if there are no changes in relative word frequency.
However, if relative subjective frequencies do not stay constant, this difference should lead to a difference in the size or slope of the frequency effect when a logarithmic transformation is applied to the frequencies. It should be noted that the reasoning that differences in the size of the frequency effect are only due to the logarithmic relationship between word frequencies and word processing times is therefore incomplete (e.g., Duyck, Vanderelst, Desmet & Hartsuiker, 2008; Schmidtke, 2014).
Language exposure
The weaker links theory is consistent with the individual differences account of Kuperman and Van Dyke (2013) in the sense that differences in the FE are attributed to the degree of exposure rather than to qualitative differences originating from the acquisition of multiple languages. However, the weaker links theory makes the general claim that a) there is an overall lower (absolute) exposure to language for bilinguals than for monolinguals and b) that this results in a larger FE for bilinguals. A pure exposure-based account leaves open the possibility that bilinguals may have the same degree of exposure to one (or, in principle, more) of their two languages as monolinguals have. This account can also specify the exact locus of the modulation of the size of the FE, namely that it arises from differences in the ratios of high and low relative frequencies for individuals with different levels of exposure.
As already discussed, language exposure should be an important determinant of the shape and size of the FE. It is therefore of vital importance to have a good measurement for this variable. Most experiments use subjective measures like questionnaires to assess exposure; some try to quantify exposure by measuring language proficiency.
Because there is a direct relation between the obtained measure of vocabulary size and the degree of exposure (e.g., Baayen, 2001), we prefer the use of a vocabulary test to assess language proficiency. By using vocabulary growth curves (see Figure 1), we can see a tight relationship between language exposure (word tokens on the x-axis) and vocabulary size (word types on the y-axis). Word tokens are all words in a language corpus, including repetitions, whereas word types are the unique words. As the number of word tokens grows, so does the number of word types.
Figure 1. An example of a vocabulary growth curve. This plot shows the number of word tokens encountered (on the x-axis) and the number of word types encountered (on the y-axis) when reading the Dutch version of the novel ‘The Mysterious Affair at Styles’.
When vocabulary size is small, the probability that the next encountered word will be a hitherto unseen type is large, but as exposure grows the probability that the next word will be a new type decreases. As a result, doubling vocabulary size requires much more than twice the amount of exposure. Likewise, the more exposure one has, the smaller the increase in vocabulary size that is associated with additional exposure. Assuming no large differences in the complexity of the material that one is exposed to, a similar vocabulary score indicates similar exposure and an increase in vocabulary scores indicates a higher degree of exposure. For subjects with an equal but very high vocabulary score, it becomes more uncertain that they have the exact same amount of language exposure. Nevertheless, on the whole, when participants have equal proficiency scores, we do not expect differential FE’s, because language exposure should be quite similar.
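The vocabulary growth curve described above can be sketched in a few lines. This is a minimal illustration that assumes whitespace tokenization and lowercasing (both simplifications introduced here); the toy sentence is made up and merely stands in for the text of a novel.

```python
def vocabulary_growth(tokens):
    """Return the cumulative number of distinct word types after each token."""
    seen_types = set()
    curve = []
    for token in tokens:
        seen_types.add(token.lower())  # assumption: case-insensitive types
        curve.append(len(seen_types))
    return curve


# Hypothetical toy text standing in for a full novel.
toy_text = "the cat saw the dog and the dog saw the cat again"
curve = vocabulary_growth(toy_text.split())

for n_tokens, n_types in enumerate(curve, start=1):
    print(n_tokens, n_types)
```

Plotting the token count against the type count gives a curve of the kind shown in Figure 1. Even in this toy example the number of types grows quickly at first and then flattens, because later tokens are increasingly repetitions of types that were already encountered.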
Kuperman and Van Dyke (2013) note that robust interactions between language proficiency and word frequency have been found in a wide range of studies concerning individual reading differences: more proficient readers showed a smaller frequency effect on reaction times (for examples see Chateau & Jared, 2000 and Diependaele et al., 2013). Although this is indeed a robust finding, it must be noted that some authors have claimed that this finding might be an artifact of the base-rate effect (Butler & Hains, 1979; Faust et al., 1999; Yap et al., 2012). The base-rate effect is the observation that the magnitude of lexical effects correlates positively with reaction latencies. This would mean that the larger frequency effects for participants with a lower language proficiency score would be mainly due to the fact that their reaction times are longer than those of more highly skilled participants. However, Kuperman and Van Dyke (2013) showed that the interaction between word frequency and language skill is still present after z-transforming reaction times per subject, thus eliminating any kind of base-rate effect.
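How a per-subject z-transformation removes a base-rate effect can be shown with a minimal sketch. The reaction times below are hypothetical, the function name is introduced here for illustration, and the code is not the analysis used by Kuperman and Van Dyke (2013). The slow reader's times are exactly twice those of the fast reader, so the raw frequency effect is twice as large purely because of overall speed.

```python
from statistics import mean, pstdev


def z_transform_by_subject(rts_by_subject):
    """Standardize each subject's reaction times with that subject's own mean and SD."""
    z_scores = {}
    for subject, rts in rts_by_subject.items():
        subject_mean = mean(rts)
        subject_sd = pstdev(rts)
        z_scores[subject] = [(rt - subject_mean) / subject_sd for rt in rts]
    return z_scores


def frequency_effect(values):
    """Mean of the two low frequency items minus mean of the two high frequency items."""
    high, low = values[:2], values[2:]
    return mean(low) - mean(high)


# Hypothetical reaction times in ms, ordered [high, high, low, low] frequency.
rts = {
    "fast_reader": [200.0, 220.0, 250.0, 270.0],
    "slow_reader": [400.0, 440.0, 500.0, 540.0],
}
z = z_transform_by_subject(rts)

for subject in rts:
    print(subject, frequency_effect(rts[subject]), round(frequency_effect(z[subject]), 3))
```

In this constructed case the raw frequency effects differ (50 ms versus 100 ms), while the standardized effects coincide. A difference in the frequency effect that survives this transformation therefore cannot be attributed to overall response speed alone.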